Experiments in Terabyte Searching, Genomic Retrieval and Novelty Detection for TREC 2004

نویسندگان

  • Stephen Blott
  • Fabrice Camous
  • Paul Ferguson
  • Georgina Gaughan
  • Cathal Gurrin
  • Gareth J. F. Jones
  • Noel Murphy
  • Noel E. O'Connor
  • Alan F. Smeaton
  • Peter Wilkins
  • Oisín Boydell
  • Barry Smyth
چکیده

In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we will discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large scale, distributed information retrieval, which underlies all of the track experiments described in this document.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Amberfish at the TREC 2004 Terabyte Track

The TREC 2004 Terabyte Track evaluated information retrieval in largescale text collections, using a set of 25 million documents (426 GB). This paper gives an overview of our experiences with this collection and describes Amberfish, the text retrieval software used for the experiments.

متن کامل

Finding New News: Novelty Detection in Broadcast News

The automatic detection of novelty, or newness, as part of an information retrieval system would greatly improve a searcher’s experience by presenting “documents” in order of how much extra information they add to what is already known instead of how similar they are to a user’s query. In this paper we present a novelty detection system evaluated on the AQUAINT text collection as part of our TR...

متن کامل

Experiments in Novelty Detection at Columbia University

This paper describes the method we used for the Novelty Track for the 2002 Text Retrieval Conference (TREC). We tried to adapt tools we are developing for a task closely related to the novelty part of the this track. The system we are building will scan a stream of documents and present to the user only the new information it finds. For the “relevance” part of the TREC, we decided to test the a...

متن کامل

Improved Feature Selection and Redundance Computing - THUIR at TREC 2004 Novelty Track

This is the third years that Tsinghua University Information Retrieval Group (THUIR) participates in Novelty task of TREC. Our research on this year’s novelty track mainly focused on four aspects: (1) text feature selection and reduction; (2) improved sentence classification in finding relevant information; (3)efficient sentence redundancy computing; (4) effective result filtering. All experime...

متن کامل

Experiments in TREC 2004 Novelty Track at CAS-ICT

The main task in Novelty Track is to retrieve relevant sentences and remove duplicates from a document set given a TREC topic. This track took place for the first time in TREC 2002 and it is refined to four tasks in TREC 2003. Besides 25 relevant documents, irrelevant ones are given in this year of Novelty track. In other words, a given document is either relevant or irrelevant to the topic. Th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2004